The "Students Performance in Exams" dataset records how students performed in math, reading, and writing exams, together with background information on each student: gender, race/ethnicity, parental level of education, lunch status (standard or free/reduced), and whether they completed a test preparation course. The data is a table in which each row represents one student and each column represents one attribute or feature. There are 1000 rows (students) and 8 columns (features).
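Here is a description of each column:
gender: the student's gender (male or female)
race/ethnicity: a group label (group A to group E)
parental level of education: some high school, high school, some college, associate's degree, bachelor's degree, or master's degree
lunch: standard or free/reduced
test preparation course: none or completed
math score, reading score, writing score: integer exam scores, with a maximum of 100 in this data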
It's also important to consider other factors that may affect exam performance, such as the students' gender, race/ethnicity, parental level of education, and lunch status. You could control for these variables in your analysis by comparing only students who are similar in these regards (for example, comparing only male students or only students from group A).
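One way to compare "like with like" is to compute the statistic of interest inside each subgroup. The following is a minimal sketch on a small hypothetical table (the rows are invented for illustration and only reuse the dataset's column names); it looks at the effect of the test preparation course separately for each gender.

```python
import pandas as pd

# Hypothetical toy data using the dataset's column names; NOT real rows from the CSV.
toy = pd.DataFrame({
    "gender": ["male", "male", "male", "male",
               "female", "female", "female", "female"],
    "test preparation course": ["none", "completed", "none", "completed",
                                "none", "completed", "none", "completed"],
    "math score": [60, 70, 64, 74, 55, 68, 59, 72],
})

# Mean math score by test preparation status, computed within each gender.
stratified = (
    toy.groupby(["gender", "test preparation course"])["math score"]
       .mean()
       .unstack("test preparation course")
)

# Gap between prepared and unprepared students inside each subgroup.
stratified["difference"] = stratified["completed"] - stratified["none"]
print(stratified)
```

The same pattern applies to the real frame by swapping `toy` for `x`; adding more columns to the `groupby` list holds more variables fixed, at the cost of fewer students per subgroup.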
It's worth noting that the "Students Performance in Exams" dataset is a relatively small dataset with only 1000 rows, so it may not be representative of the entire population of students. Additionally, the data may not include all relevant factors that could affect exam performance, so it's important to interpret the results of any analysis with caution.
It's difficult to determine the exact factors that contribute to test outcomes without more information about the specific context in which the tests were taken. However, some general factors that could potentially affect test performance include:
The student's level of knowledge and understanding of the material being tested
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
x = pd.read_csv("exams (1).csv")
x.head()
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | male | group A | high school | standard | completed | 67 | 67 | 63 |
| 1 | female | group D | some high school | free/reduced | none | 40 | 59 | 55 |
| 2 | male | group E | some college | free/reduced | none | 59 | 60 | 50 |
| 3 | male | group B | high school | standard | none | 77 | 78 | 68 |
| 4 | male | group E | associate's degree | standard | completed | 78 | 73 | 68 |
x.tail()
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 995 | male | group C | high school | standard | none | 73 | 70 | 65 |
| 996 | male | group D | associate's degree | free/reduced | completed | 85 | 91 | 92 |
| 997 | female | group C | some high school | free/reduced | none | 32 | 35 | 41 |
| 998 | female | group C | some college | standard | none | 73 | 74 | 82 |
| 999 | male | group A | some college | standard | completed | 65 | 60 | 62 |
#draw a count plot to see the frequency of each value in a column
sns.countplot(x= x['gender'])
<AxesSubplot:xlabel='gender', ylabel='count'>
#group countplot
sns.countplot(x= x['gender'], hue=x['parental level of education'])
<AxesSubplot:xlabel='gender', ylabel='count'>
#group countplot
sns.countplot(x= x['gender'], hue=x['test preparation course'])
plt.title('Countplot for Exam performance',pad=20, fontsize=15)
Text(0.5, 1.0, 'Countplot for Exam performance')
#group countplot
sns.countplot(x= x['gender'], hue=x['lunch'],saturation = 1, palette = 'colorblind')
<AxesSubplot:xlabel='gender', ylabel='count'>
#group countplot
sns.countplot(y= x['gender'], hue=x['lunch'],saturation = 1, palette = 'Accent')
#saving a plot
plt.savefig('cont_plot.pdf')
#how to draw a scatter plot
sns.scatterplot(data = x, x='math score',y='reading score')
<AxesSubplot:xlabel='math score', ylabel='reading score'>
#how to draw a scatter plot
sns.scatterplot(data = x, x='math score',y='writing score',hue='lunch',palette=['Green','red'])
<AxesSubplot:xlabel='math score', ylabel='writing score'>
#how to draw a scatter plot
sns.scatterplot(data = x, x='reading score',y='writing score',hue='gender')
<AxesSubplot:xlabel='reading score', ylabel='writing score'>
#how to draw a scatter plot
sns.scatterplot(data = x, x='writing score',y='math score',hue='test preparation course',palette=['Green','darkviolet'])
#scatter plot: each point is one student at the coordinates (x, y)
<AxesSubplot:xlabel='writing score', ylabel='math score'>
# make a scatter plot with matplotlib
plt.scatter(x['math score'], x['reading score'], marker='*' ,c=x['writing score']) #marker sets the point symbol; c colors each point by writing score
# adding labels
plt.xlabel("math score")
plt.ylabel("reading score")
plt.title("Scatterplot for math score vs reading score (color = writing score)")
plt.colorbar()
plt.show()
plt.scatter(x['gender'], x['reading score'], marker='*' ,c=x['writing score']) #marker sets the point symbol; c colors each point by writing score
# adding labels
plt.xlabel("gender")
plt.ylabel("reading score")
plt.title("Scatterplot for reading score by gender (color = writing score)")
plt.colorbar()
plt.show()
#create a boxplot
sns.boxplot(data = x,y='math score',showmeans= True,)
<AxesSubplot:ylabel='math score'>
# create a boxplot
sns.boxplot(data = x,y='writing score',palette='Dark2')
<AxesSubplot:ylabel='writing score'>
# create a boxplot
sns.boxplot(data = x,y='reading score',palette='Set1')
<AxesSubplot:ylabel='reading score'>
# create a boxplot
sns.boxplot(data = x, x='gender',y='math score',showmeans= True,)
<AxesSubplot:xlabel='gender', ylabel='math score'>
# create a boxplot
sns.boxplot(data = x, x='gender',y='math score',showmeans= True,)
sns.swarmplot(data = x, x='gender',y='math score', size=3, color='black')
<AxesSubplot:xlabel='gender', ylabel='math score'>
#histplot
sns.histplot(data=x, x='math score')
<AxesSubplot:xlabel='math score', ylabel='Count'>
#histplot
sns.histplot(data=x, x='reading score', binwidth=1) #palette only takes effect together with hue, so it is omitted here
<AxesSubplot:xlabel='reading score', ylabel='Count'>
#histplot
sns.histplot(data=x, x='writing score') #palette only takes effect together with hue, so it is omitted here
<AxesSubplot:xlabel='writing score', ylabel='Count'>
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',kind='boxen')
<seaborn.axisgrid.FacetGrid at 0x2308c1e9eb0>
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='parental level of education',kind='boxen')
<seaborn.axisgrid.FacetGrid at 0x2308c1f2520>
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='race/ethnicity',kind='boxen')
<seaborn.axisgrid.FacetGrid at 0x2308c26feb0>
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='gender',kind='boxen')
<seaborn.axisgrid.FacetGrid at 0x2308c1b2bb0>
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='race/ethnicity', kind='bar', capsize=0.1) #capsize adds caps to the ends of the error bars
<seaborn.axisgrid.FacetGrid at 0x2308c1f5ca0>
sns.catplot(data = x, x='gender',y='reading score',hue='lunch',kind='bar', capsize=0.1, ci=95) #ci=95 draws a 95% confidence interval for each bar's mean as the error bar (seaborn 0.12+ uses errorbar=('ci', 95) instead)
<seaborn.axisgrid.FacetGrid at 0x2308bd427f0>
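The error bars in these bar plots are confidence intervals for the mean, which seaborn estimates by bootstrapping. The sketch below illustrates the idea with a percentile bootstrap in NumPy. The helper `bootstrap_ci_mean` and its parameters are hypothetical names chosen for this example, not seaborn's internals, and the ten reading scores are the ones visible in the `head()` and `tail()` output at the start of the notebook.

```python
import numpy as np

def bootstrap_ci_mean(values, n_boot=2000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample the data with replacement n_boot times and take each resample's mean.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # The interval spans the central `level` percent of the resampled means.
    alpha = (100 - level) / 2
    low, high = np.percentile(boot_means, [alpha, 100 - alpha])
    return low, high

# Reading scores of the ten students shown by x.head() and x.tail().
scores = [67, 59, 60, 78, 73, 70, 91, 35, 74, 60]
low, high = bootstrap_ci_mean(scores)
print(f"mean = {np.mean(scores):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

A wide interval means the group mean is estimated imprecisely, which is common when a subgroup contains few students.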
sns.catplot(data = x, x='gender',y='reading score',hue='lunch',kind='bar', capsize=0.1, ci=95, col='test preparation course')
#the col parameter draws a separate panel for each value of test preparation course.
<seaborn.axisgrid.FacetGrid at 0x23086353a60>
sns.catplot(data = x, x='gender',y='reading score',hue='parental level of education',kind='bar', capsize=0.1)
<seaborn.axisgrid.FacetGrid at 0x2308c27e0a0>
#how to create a violin plot
#the kind parameter selects the type of plot
#the hue parameter splits each category by a second variable for comparison.
sns.catplot(data = x, x='gender',y='reading score',hue='race/ethnicity',kind='violin')
<seaborn.axisgrid.FacetGrid at 0x2308c26ffd0>
#how to create a violin plot
#the kind parameter selects the type of plot
sns.catplot(data = x, x='race/ethnicity',y='math score',hue='gender',kind='violin')
<seaborn.axisgrid.FacetGrid at 0x2308bd5d7c0>
#how to create a violin plot
sns.catplot(data = x, x='lunch',y='writing score',hue='parental level of education',kind='violin')
<seaborn.axisgrid.FacetGrid at 0x2308e2cb670>
#how to create a violin plot
#the col parameter draws separate panels for male and female students.
sns.catplot(data = x, x='lunch',y='writing score',hue='parental level of education',kind='violin',col='gender')
<seaborn.axisgrid.FacetGrid at 0x2308e258bb0>
#how to create a violin plot
#the col parameter draws a separate panel for each lunch type.
sns.catplot(data = x, x='gender',y='reading score',hue='test preparation course',kind='violin',col='lunch')
<seaborn.axisgrid.FacetGrid at 0x2308dc754f0>
# *Group violin plot*
sns.catplot(data=x, x= 'gender', y='math score', col='parental level of education', kind='violin')
<seaborn.axisgrid.FacetGrid at 0x2308e33e460>
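# how to overlay a swarm plot on a violin plot
# note: sns.swarmplot draws only on the current axes, so only one panel of the catplot grid gets the overlay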
sns.catplot(data = x, x='gender',y='reading score',hue='test preparation course',kind='violin',col='lunch')
sns.swarmplot(data = x, x='gender',y='reading score',size=2)
<AxesSubplot:title={'center':'lunch = free/reduced'}, xlabel='gender', ylabel='reading score'>
sns.histplot(data=x, x="math score", kde=True)
<AxesSubplot:xlabel='math score', ylabel='Count'>
sns.histplot(data=x, x="reading score", kde=True)
<AxesSubplot:xlabel='reading score', ylabel='Count'>
sns.histplot(data=x, x="writing score", kde=True)
<AxesSubplot:xlabel='writing score', ylabel='Count'>
# How to plot a line plot
sns.lineplot(data=x, x='math score', y='reading score')
<AxesSubplot:xlabel='math score', ylabel='reading score'>
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='reading score', color ='Green')
<AxesSubplot:xlabel='writing score', ylabel='reading score'>
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', color = 'red')
<AxesSubplot:xlabel='writing score', ylabel='math score'>
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', hue='gender', palette=['#8403fc','#fc03d3'])
<AxesSubplot:xlabel='writing score', ylabel='math score'>
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', hue='parental level of education')
<AxesSubplot:xlabel='writing score', ylabel='math score'>
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', hue='race/ethnicity')
<AxesSubplot:xlabel='writing score', ylabel='math score'>
import seaborn as sns
sns.stripplot(data= x, x= "lunch", y= "math score", jitter=True, hue="gender")
plt.show()
sns.lmplot( data=x, x='math score',y='reading score',hue='parental level of education',row='gender', palette="Set1")
<seaborn.axisgrid.FacetGrid at 0x23091d1ecd0>
sns.lmplot( data=x, x='math score',y='reading score',hue='race/ethnicity',row='gender', palette="Set1")
<seaborn.axisgrid.FacetGrid at 0x23093bdca90>
sns.jointplot(data=x, x='gender', y='math score', kind='scatter',palette='Set1')
<seaborn.axisgrid.JointGrid at 0x2309303c6a0>
sns.jointplot(data=x, x='lunch', y='writing score', kind='scatter',color='#34eb46')
<seaborn.axisgrid.JointGrid at 0x23093bedc10>
sns.set_theme(style="darkgrid")
sns.kdeplot(data=x, x='math score',y='reading score')
<AxesSubplot:xlabel='math score', ylabel='reading score'>
sns.set_theme(style="darkgrid")
sns.kdeplot(data=x, x='math score',y='writing score')
<AxesSubplot:xlabel='math score', ylabel='writing score'>
sns.set_theme(style="darkgrid")
sns.kdeplot(data=x, x='reading score',y='writing score')
<AxesSubplot:xlabel='reading score', ylabel='writing score'>
import pandas as pd
x = pd.read_csv("exams (1).csv")
x.tail()
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 995 | male | group C | high school | standard | none | 73 | 70 | 65 |
| 996 | male | group D | associate's degree | free/reduced | completed | 85 | 91 | 92 |
| 997 | female | group C | some high school | free/reduced | none | 32 | 35 | 41 |
| 998 | female | group C | some college | standard | none | 73 | 74 | 82 |
| 999 | male | group A | some college | standard | completed | 65 | 60 | 62 |
x.head()
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | male | group A | high school | standard | completed | 67 | 67 | 63 |
| 1 | female | group D | some high school | free/reduced | none | 40 | 59 | 55 |
| 2 | male | group E | some college | free/reduced | none | 59 | 60 | 50 |
| 3 | male | group B | high school | standard | none | 77 | 78 | 68 |
| 4 | male | group E | associate's degree | standard | completed | 78 | 73 | 68 |
x.sample(20)
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 948 | male | group E | some high school | free/reduced | completed | 49 | 50 | 45 |
| 320 | male | group E | high school | free/reduced | completed | 58 | 56 | 49 |
| 145 | female | group B | high school | free/reduced | none | 51 | 57 | 57 |
| 753 | male | group D | associate's degree | free/reduced | none | 61 | 59 | 54 |
| 63 | male | group A | bachelor's degree | standard | completed | 77 | 82 | 78 |
| 985 | male | group E | associate's degree | standard | none | 74 | 73 | 67 |
| 944 | female | group C | associate's degree | standard | completed | 57 | 72 | 73 |
| 522 | male | group D | associate's degree | standard | none | 78 | 74 | 71 |
| 177 | male | group B | some high school | free/reduced | none | 55 | 53 | 51 |
| 238 | male | group B | associate's degree | standard | none | 76 | 62 | 64 |
| 740 | male | group B | some college | standard | none | 52 | 53 | 51 |
| 495 | male | group E | some college | standard | none | 78 | 65 | 60 |
| 147 | male | group E | some college | standard | none | 60 | 55 | 46 |
| 807 | female | group D | master's degree | standard | none | 61 | 72 | 64 |
| 959 | male | group D | master's degree | standard | completed | 91 | 84 | 83 |
| 142 | female | group D | bachelor's degree | standard | none | 86 | 83 | 87 |
| 586 | female | group B | master's degree | standard | completed | 59 | 73 | 74 |
| 918 | male | group C | some high school | standard | none | 72 | 68 | 67 |
| 569 | female | group B | some college | standard | none | 70 | 73 | 66 |
| 798 | male | group C | some college | standard | none | 56 | 55 | 50 |
# draw a random sample containing 50% of the rows.
x.sample(frac=0.5)
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 577 | male | group C | master's degree | standard | none | 68 | 60 | 62 |
| 344 | male | group B | high school | standard | completed | 80 | 80 | 79 |
| 676 | male | group E | associate's degree | free/reduced | none | 63 | 61 | 62 |
| 691 | female | group D | associate's degree | free/reduced | none | 68 | 84 | 80 |
| 848 | male | group E | high school | standard | none | 94 | 89 | 86 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 788 | male | group D | high school | standard | none | 61 | 62 | 60 |
| 779 | male | group C | associate's degree | free/reduced | none | 38 | 51 | 47 |
| 964 | male | group E | bachelor's degree | standard | none | 100 | 83 | 86 |
| 927 | male | group C | bachelor's degree | free/reduced | completed | 61 | 71 | 65 |
| 250 | female | group D | some college | free/reduced | none | 53 | 70 | 71 |
500 rows × 8 columns
# to know the number of rows and columns
x.shape
(1000, 8)
# to see the column data types and non-null counts, call info() with parentheses
# (a bare x.info only returns the bound method, whose repr dumps the whole frame)
x.info()
x.describe()
| | math score | reading score | writing score |
|---|---|---|---|
| count | 1000.000000 | 1000.000000 | 1000.000000 |
| mean | 66.396000 | 69.002000 | 67.738000 |
| std | 15.402871 | 14.737272 | 15.600985 |
| min | 13.000000 | 27.000000 | 23.000000 |
| 25% | 56.000000 | 60.000000 | 58.000000 |
| 50% | 66.500000 | 70.000000 | 68.000000 |
| 75% | 77.000000 | 79.000000 | 79.000000 |
| max | 100.000000 | 100.000000 | 100.000000 |
x.isnull().values.any()
False
#value_counts() is a useful method that counts how often each distinct value occurs in a Series; it is most informative for categorical columns.
x["math score"].value_counts()
63 34
71 30
77 30
74 28
57 27
..
26 2
23 1
29 1
34 1
25 1
Name: math score, Length: 77, dtype: int64
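`value_counts()` is easier to read on a categorical column, and `normalize=True` turns the counts into proportions. A minimal sketch, using the five `lunch` values visible in the `head()` output at the start of the notebook:

```python
import pandas as pd

# The lunch values of the first five students, as shown by x.head().
lunch = pd.Series(
    ["standard", "free/reduced", "free/reduced", "standard", "standard"],
    name="lunch",
)

counts = lunch.value_counts()                 # absolute frequency of each category
shares = lunch.value_counts(normalize=True)   # relative frequency, summing to 1

print(counts)
print(shares)
```

On the full frame, the equivalent call would be `x["lunch"].value_counts(normalize=True)`.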
#groupby computes aggregate measures, such as the mean, for each group.
#numeric_only=True keeps newer pandas versions from trying to average the text columns.
x.groupby(['math score','gender']).mean(numeric_only=True)
| math score | gender | reading score | writing score |
|---|---|---|---|
| 13 | female | 32.500000 | 30.000000 |
| 23 | female | 44.000000 | 44.000000 |
| 25 | female | 36.000000 | 37.000000 |
| 26 | female | 39.000000 | 37.000000 |
| 28 | female | 41.000000 | 40.500000 |
| ... | ... | ... | ... |
| 97 | male | 87.000000 | 87.000000 |
| 98 | male | 89.000000 | 87.333333 |
| 99 | male | 86.000000 | 89.666667 |
| 100 | female | 100.000000 | 100.000000 |
| 100 | male | 89.454545 | 90.181818 |
140 rows × 2 columns
x.duplicated().sum()
1
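The result above says that one row repeats an earlier row exactly. A short sketch of how such rows can be inspected and removed, shown on a hypothetical three-row frame (invented for illustration) so the behavior is easy to follow:

```python
import pandas as pd

# Hypothetical toy frame; the last row repeats the first one exactly.
toy = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "math score": [67, 40, 67],
    "reading score": [67, 59, 67],
})

# duplicated() flags rows that repeat an earlier row, so the sum counts the extras.
n_extra = int(toy.duplicated().sum())

# keep=False flags every copy, which helps when inspecting what is duplicated.
all_copies = toy[toy.duplicated(keep=False)]

# drop_duplicates() keeps the first occurrence and discards the repeats.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_extra)
print(all_copies)
print(deduped.shape)
```

Whether to drop the duplicate in the real data is a judgment call: two different students can legitimately share the same attributes and scores.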
x['math score'].unique()
x['reading score'].unique()
x['writing score'].unique()
array([ 63, 55, 50, 68, 76, 84, 65, 45, 85, 90, 73, 57, 42,
44, 31, 88, 54, 32, 56, 60, 89, 51, 77, 39, 71, 74,
75, 72, 64, 82, 70, 87, 78, 49, 47, 62, 83, 48, 59,
97, 81, 67, 69, 61, 93, 100, 53, 79, 58, 33, 86, 66,
46, 80, 91, 92, 95, 99, 96, 28, 52, 24, 40, 43, 94,
23, 38, 30, 35, 41, 98, 36, 27, 26, 34, 37], dtype=int64)
# to know the number of unique values in each column.
x.nunique()
gender                          2
race/ethnicity                  5
parental level of education     6
lunch                           2
test preparation course         2
math score                     77
reading score                  73
writing score                  76
dtype: int64
x.isnull().sum()
gender                         0
race/ethnicity                 0
parental level of education    0
lunch                          0
test preparation course        0
math score                     0
reading score                  0
writing score                  0
dtype: int64
#Datatypes
x.dtypes
gender                         object
race/ethnicity                 object
parental level of education    object
lunch                          object
test preparation course        object
math score                      int64
reading score                   int64
writing score                   int64
dtype: object
#Filter data
x[x['math score']==50].head()
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 175 | female | group D | some college | free/reduced | none | 50 | 56 | 60 |
| 196 | male | group D | bachelor's degree | standard | none | 50 | 46 | 48 |
| 200 | female | group D | associate's degree | standard | completed | 50 | 63 | 65 |
| 296 | male | group D | some high school | free/reduced | none | 50 | 47 | 46 |
| 312 | male | group C | associate's degree | free/reduced | none | 50 | 48 | 43 |
x[x['reading score'] <50]
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 9 | male | group C | some college | free/reduced | none | 47 | 42 | 45 |
| 16 | male | group B | high school | standard | none | 58 | 47 | 42 |
| 18 | female | group C | associate's degree | free/reduced | none | 23 | 44 | 44 |
| 19 | male | group C | some college | free/reduced | none | 39 | 32 | 31 |
| 24 | male | group E | some high school | free/reduced | none | 46 | 38 | 32 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 938 | male | group A | high school | free/reduced | none | 45 | 33 | 32 |
| 962 | male | group B | some high school | standard | completed | 46 | 46 | 44 |
| 976 | female | group B | some college | free/reduced | completed | 31 | 29 | 35 |
| 981 | male | group C | some college | standard | none | 64 | 48 | 48 |
| 997 | female | group C | some high school | free/reduced | none | 32 | 35 | 41 |
103 rows × 8 columns
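Filters can also combine several conditions. The sketch below rebuilds the first five students from the `head()` output (a subset of the columns) and shows three equivalent-style idioms: boolean masks joined with `&`, membership tests with `isin()`, and the string form `query()`.

```python
import pandas as pd

# The first five students from x.head(), restricted to four columns.
sample = pd.DataFrame({
    "gender": ["male", "female", "male", "male", "male"],
    "lunch": ["standard", "free/reduced", "free/reduced", "standard", "standard"],
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
})

# Each condition needs its own parentheses; & means "and", | means "or".
both = sample[(sample["math score"] >= 60) & (sample["lunch"] == "standard")]

# isin() keeps rows whose value appears in the given list.
picked = sample[sample["math score"].isin([40, 78])]

# query() expresses the same filter as a string; backticks quote names with spaces.
same = sample.query("`math score` >= 60 and lunch == 'standard'")

print(both)
print(picked)
```

Using Python's `and`/`or` instead of `&`/`|` on Series raises an error, which is a common stumbling block with this syntax.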
#Boxplot
x[['reading score']].boxplot()
<AxesSubplot:>
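#Show the pairwise correlation between the numeric columns
#(selecting numeric columns first avoids an error on the text columns in newer pandas versions)
x.select_dtypes('number').corr()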
| | math score | reading score | writing score |
|---|---|---|---|
| math score | 1.000000 | 0.819398 | 0.805944 |
| reading score | 0.819398 | 1.000000 | 0.954274 |
| writing score | 0.805944 | 0.954274 | 1.000000 |
# you can even visualize the correlation matrix using the seaborn library
sns.heatmap(x.select_dtypes('number').corr())
<AxesSubplot:>
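Each cell of the correlation matrix is a Pearson correlation coefficient: the sum of products of the two columns' deviations from their means, divided by the square root of the product of their summed squared deviations. The sketch below computes it by hand for the math and reading scores of the five students in the `head()` output and checks the result against pandas. With only five rows the value is illustrative and will not match the 0.819 reported for the full data.

```python
import numpy as np
import pandas as pd

# Math and reading scores of the first five students from x.head().
scores = pd.DataFrame({
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
})

# Deviations of each score from its column mean.
a = scores["math score"] - scores["math score"].mean()
b = scores["reading score"] - scores["reading score"].mean()

# Pearson r: co-variation of the two columns, scaled by their individual variation.
r_manual = (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

# pandas uses the Pearson method by default.
r_pandas = scores["math score"].corr(scores["reading score"])

print(r_manual, r_pandas)
```

Values near 1 mean students who score high in one subject tend to score high in the other; the coefficient says nothing about which subject, if either, drives the other.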